Bayesian hidden mark interaction model for detecting spatially variable genes in imaging-based spatially resolved transcriptomics data

Recent technology breakthroughs in spatially resolved transcriptomics (SRT) have enabled the comprehensive molecular characterization of cells whilst preserving their spatial and gene expression contexts. One of the fundamental questions in analyzing SRT data is the identification of spatially variable genes whose expressions display spatially correlated patterns. Existing approaches are built upon either the Gaussian process-based model, which relies on ad hoc kernels, or the energy-based Ising model, which requires gene expression to be measured on a lattice grid. To overcome these potential limitations, we developed a generalized energy-based framework to model gene expression measured from imaging-based SRT platforms, accommodating the irregular spatial distribution of measured cells. Our Bayesian model applies a zero-inflated negative binomial mixture model to dichotomize the raw count data, reducing noise. Additionally, we incorporate a geostatistical mark interaction model with a generalized energy function, where the interaction parameter is used to identify the spatial pattern. Auxiliary variable MCMC algorithms were employed to sample from the posterior distribution with an intractable normalizing constant. We demonstrated the strength of our method on both simulated and real data. Our simulation study showed that our method captured various spatial patterns with high accuracy; moreover, analysis of a seqFISH dataset and a STARmap dataset established that our proposed method is able to identify genes with novel and strong spatial patterns.


Introduction
Recent advancements in spatially resolved transcriptomics (SRT) technology have fundamentally transformed our capacity to study cellular behavior at a molecular level, while preserving their spatial and gene expression contexts.This technological leap has opened new avenues for exploring complex biological systems at unprecedented levels of detail and accuracy.Efremova et al. (2020) and Liao et al. (2021) found that the positional context of gene expression is important to understanding tissue functionality and pathology changes, which highlights the pivotal role of SRT techniques.Broadly, SRT technologies are categorized into sequencing-based and imaging-based methods based on differences in RNA profiling: sequencing-based and imagingbased.Spatial transcriptomics, one of the next-generation sequencing (NGS) technologies, resolves gene expression profiles at a resolution of 100 μm.Spatial transcriptomics implemented by the 10x Visium platform achieved 55 μm resolution, allowing for a detailed study of spatial organization.On the other hand, imagingbased technologies have revolutionized the field of transcriptomics by achieving single-cell resolution, with prominent examples such as sequential fluorescence in situ hybridization seqFISH (Ståhl et al., 2016), seqFISH+ (Eng et al., 2019), and multiplexed error-robust FISH (MERFISH) (Moffitt et al., 2018).Datasets profiled by SRT technologies have inspired the exploration of the spatial organization of gene expression within tissues.Cohorts with single-cell resolution motivate more biological analysis, such as cell-cell communication analysis via CellChat (Jin et al., 2021), characterization of ligand-receptor interactions between different cell types (Efremova et al., 2020) and so on.Hence, spatial information provided by imaing-based SRT data makes it more feasible to identify and quantify gene expression in specific regions of a tissue.
One of the most interesting questions arising along the development of SRT techniques is the identification of spatially variable genes (SVGs) whose expressions display spatially correlated patterns.Studies have found that SVGs demarcate clear spatial substructure, and are relevant to disease progression (Svensson et al., 2018;Hu et al., 2021).Various methods across different fields have been developed to identify SVGs, each capitalizing on distinct strengths.Trendsceek (Edsgärd et al., 2018) is built upon marked point processes to rank and evaluate the spatial pattern of each gene; however, it yields unsatisfactory performance (Sun et al., 2020) and is inhibited from scaling to large-scale data due to the expensive computational cost (Sun et al., 2020;Dries et al., 2021).SpatialDE (Svensson et al., 2018), SPARK (Sun et al., 2020), and BOOST-GP (Li et al., 2021) capture spatial correlation patterns by utilizing the Gaussian process.Specifically, SpatialDE models normalized gene expression levels via a multivariate Gaussian model with a spatial covariance function characterizing linear and periodic spatial patterns.SPARK models raw counts using a generalized linear spatial model with different periodic and Gaussian kernels.BOOST-GP models raw counts with a Bayesian zero-inflated negative binomial (ZINB) model with a squared exponential kernel covariance matrix.However, the performance of these kernel-based methods relies heavily on the resemblance between the underlying spatial expression patterns and the predefined kernel functions (Jiang et al., 2022).BinSpect (Dries et al., 2021), a non-model based method, identifies SVGs through statistical enrichment analysis of spatial network neighbors with binarized gene expression states.SpaGCN (Hu et al., 2021) defines SVGs as genes as those exhibiting differential expression among spatial domains and employs a deep learning model to identify these domains.BOOST-MI (Jiang et al., 2022) utilizes an energy-based modified Ising model to identify SVGs exclusively for sequencingbased SRT data, with the limitation that the spatial position of measured spots needs to be on the regular lattice grid.Compared to kernel-based models, energy-based interaction characterization enables the detection of broader types of gene spatial expression patterns.
As mentioned, gene expressions resolved by imaging-based SRT have single-cell resolution, which potentially unearths more biological insights.We aimed to develop a model that can identify SVGs with higher accuracy to be used on data from imaging-based SRT platforms, and uncover more biological mechanisms.Drawing inspiration from the success of energybased models over kernel-based approaches (Jiang et al., 2022), we propose a novel joint Bayesian framework model, BOOST-HMI.This model utilizes a recently proposed energy function for mark interaction (Li et al., 2019).In particular, we adopt a ZINB mixture model to handle the unique data characteristics of SRT, including excess zeros and unknown mean-variance structures.Additionally, our method introduces a latent binary gene expression indicator to distinguish high and low expression states at the cellular level, thereby enhancing the model's robustness against noise.Unlike BOOST-MI, our proposed BOOST-HMI is not constrained by the spatial distribution requirements of sequencing-based SRT data, making it versatile for imaging-based datasets where cells are randomly distributed.Furthermore, BOOST-HMI directly models raw counts within a joint Bayesian framework, addressing uncertainties associated with dichotomization.Our comprehensive simulation studies, covering various scenarios, demonstrate the superior accuracy of BOOST-HMI in detecting SVGs.We also applied our model to two real datasets: a mouse hippocampus seqFISH dataset and a mouse visual cortex STARmap dataset, where it successfully detected more spatial patterns and layer-specific SVGs, potentially unveiling novel biological insights.
The rest of the paper is organized as follows: section 2 introduces our ZINB mixture model for identifying SVGs from SRT count data and discusses the extension of the Bayesian mark interaction model to SRT data.In section 3, we describe the Markov chain Monte Carlo (MCMC) algorithms for posterior sampling and the resulting posterior inference.Finally, section 4 presents our method's performance on simulated and real SRT datasets, compared with five other methodologies.

Methods
In this section, we introduce a ZINB mixture model for directly modeling the imaging-based SRT count data, and a hidden mark interaction model to quantify the spatial dependency of latent binary gene expression levels.The schematic diagram of BOOST-HMI is shown in Figure 1, and the graphical and hierarchical representations are presented in Supplementary Figure S5 and Supplementary Table S1, respectively, in the Supplementary Material.
Before introducing the models, we summarize the SRT data notations as follows.We denote the gene expression raw counts as a n-by-p matrix Y with each entry y ij ∈ N denoting the number of read counts for gene j at cell i.Every column y •j in Y denotes the expression counts across all measured cells for gene j, while each row y i• denotes the counts of all genes on cell i where i = 1, . . ., n, j = 1, . . ., p.As to geospatial profile, let a n-by-2 matrix T be the matrix for the spatial location of cells, where each row t i• (t i1 , t i2 ) ∈ R 2 records the coordinates of cell i in the 2D Cartesian plane.

A ZINB mixture model for modeling gene expression count data
For the majority of SRT techniques, gene expression measurements obtained are in the form of counts.In the context of for imaging-based SRT platforms, gene expressions are collected as the count of barcoded mRNA corresponding to a particular transcript within a single cell (Zhao et al., 2022).Due to the characteristics of these measurements, observed count data often suffers from over-dispersion and zero-inflation.The negative binomial distribution can effectively account for the meanvariance relationship in the raw counts.Moveover, the gene expression count matrix Y is characterized by an inflated number of zeros, resulting from imaging sensitivity and hybridization efficiency (Zhao et al., 2022); therefore, we generalized the negative binomial (NB) model to the ZINB model to account for both the over-dispersion and the high sparsity level, i.e., where parameter π i ∈ [0, 1] represents the false zero proportion measured on cell i.NB(], ϕ) denotes a negative binomial distribution with mean ] and dispersion parameter ϕ.Consequently, the variance is ] + ] 2 /ϕ.1/ϕ controls the overdispersion scaled by the square of mean ] 2 .The probability mass function is Γ(y+ϕ) y!Γ(ϕ) ( ϕ ]+ϕ ) ϕ ( ] ]+ϕ ) y .Given our particular circumstances, the NB mean is decomposed into two multiplicative effects, the size factor s i and the expression level ] ij .The collection of s (s 1 , . . ., s n ) ⊤ reflects nuisance effects across cells.We follow SPARK Sun et al. (2020), setting s i proportional to the summation of the total number of read counts across all genes for cell i, and combine it with a constraint of i s i = 1, which gives s i = j y ij / i j y ij .By setting the constraint for s i 's, we avoid the identifiability problem between s i 's and ] ij 's.
To denoise the relative expression levels, we aim to partition ] ij into two groups by introducing the ZINB mixture model.Dichotomization has been widely applied as a step in the analysis of SRT data.BinSpect (Dries et al., 2021) and BOOST-MI (Jiang et al., 2022) discretize the normalized expression levels for each gene into two groups for more robust SVG detection results.Here, we introduce a latent binary gene expression level indicator vector ξ •j to denote the dichotomized expression profiles of each gene j.If ξ ij = 1, gene j is highly expressed at cell i, and if ξ ij = 0, gene j has low expression at cell i.A mixture model is suggested to allow different ZINB model parametrizations for high and low expression levels for gene j, in which we assume the raw expression count y ij is generated one of two independent ZINB distributions with different means given the underlying binary indicator ξ ij , where μ 1j and μ 0j , denote the group mean of read count for highly and lowly expressed genes, respectively.To guarantee that the mean expression level for a highly expressed gene is higher than a lowly expressed gene, we set a constraint for NB distribution mean across two expression levels: μ 1j > μ 0j .ϕ 1j and ϕ 0j represent the dispersion parameters of the NB model for the highly and lowly expressed gene, respectively.
To complete the model, we specify the following prior distributions: μ 0j , μ 1j ~Ga (a μ , b μ ), s.t.μ 1j > μ 0j > 0 and ϕ 0j , ϕ 1j ~Ga (a ϕ , b ϕ ).For prior distribution setting, small values such as a μ = a ϕ = 0.01 and b μ = b ϕ = 0.01 are recommended to impose minimal information (Gelman, 2006).To create an environment conducive to model fitting, we introduce a latent variable η ij to indicate whether a zero count y ij is from the zero or NB component in Eq. 1, and impose a Bernoulli prior η ij ~Bern(π i ), which can be further relaxed by formulating a Beta(a π , b π ) hyperprior on π i , leading to a Beta-Bernoulli prior for η ij with expectation a π /(a π + b π ).For the The schematic diagram of the proposed BOOST-HMI model.

A brief review of the Bayesian mark interaction model
Marked point interaction models are statistical models for spatial point pattern analysis with applications across diverse fields such as geostatistics, ecology, material physics, and so on (Edsgärd et al., 2018;Li et al., 2019).These models are designed to study the interactions among points with numerical or categorical marks in a planar region.Marked point models are receiving greater and greater focus in biology: for instance, Trendsceek applies the marked point process to identify SVGs (Edsgärd et al., 2018).The Bayesian mark interaction model, proposed by Li et al. (2019), is a full Bayesian model that characterizes spatial correlations among cell types from tumor pathology images.
Let (t i1 , t i2 ) ∈ R 2 , i 1, . . ., n be the x-and y-coordinates of point i.Let G = (V, E) denote an interaction network with a finite set of points V and a set of direct interactions E. In the introduced Bayesian mark interaction model, we assume points have categorical marks.Here, we denote ξ (ξ 1 , . . ., ξ n ) ⊤ as the categorical marks of n points on the plane.
The Bayesian mark interaction model formulates the pattern of marks ξ via the energy function, which is first introduced in statistical mechanics.The energy function has terms to account for both firstand second-order properties of the marked point data.Specifically, to model the interaction energy between two points, an exponential decay function with respect to the distance between the two points is used.Moreover, the Bayesian mark interaction model neglects interaction terms of point pairs from E when the corresponding distance is beyond a threshold c.Consequently, the model focuses on a sparse network G′ = (V, E′), where E′ includes edges joining pairs of points i and i′ with distance d ii' < c.The setting of the distance threshold is added to avoid the high computation cost incurred when summing over n data points and (n − 1)n/2 interacting pairs of points with large n.Then, the potential energy of G′ is measured by two addictive terms, where q, q′ ∈ {1, . . ., Q} are the categories of marks.ω (ω 1 , . . ., ω Q ) ⊤ and Θ [θ qq′ ] Q×Q are defined as firstand secondorder intensities.(i ~i′) denotes the collection of interacting pairs of cells in G′. d ii′ (t i1 − t i′1 ) 2 + (t i2 − t i′2 ) 2 denotes the Euclidean distance between point i and i′.λ is the decay parameter of the distance between two points in the exponential decay function, where a larger λ makes energy diminish quickly with respect to the increase in point pair distance.
By restricting the interaction effect within radius c d , the Bayesian mark interaction model defines a local energy.According to the fundamental Hammersley-Clifford theorem (Clifford, 1990), a probability measure with a Markov property exists if we have a locally defined energy, called a Gibbs measure.This measure gives the probability of observing categorical marks associated with their locations in a particular state.We can write the joint probability on marks ξ as, which is proportional to the exponential of the negative energy of marks ξ calculated by Eq. 3. The denominator is a normalizing constant that needs to sum over the entire space Ξ of marks combination consisting of Q n states, which is intractable even for a small size model.The joint probability (Eq.4) can be considered as the full data likelihood.To interpret the parameters clearly, we write the probability of observing point i having mark category q conditional on its neighborhood configuration, where ξ −i denotes the collection of all marks, except the ith one.Eq. 5 shows that the probability of point i with mark q depends on parameter ω q , θ qq′ , and the decay parameter λ.Parameters in Eq. 5 are interpreted below.Suppose there is no interaction between any two points in the space, i.e., θ qq′ = 0; then, the conditional probability of point i with . Therefore, the model parameter ω q is related to the abundance of points with mark q.Fixing ω as equal values, we obtain the conditional probability of point i with ).The second-order intensity θ qq′ quantifies the dependency of mark q with the nearby points, with mark q′ scaling by the distance decay function.A detailed parameter interpretation is provided by Li et al. (2019).

A hidden mark interaction model for identifying SVGs
In Section 2.1, we describe how our ZINB mixture model is used to convert read counts y •j for each gene j into their corresponding hidden binary states ξ •j .This dichotomization process allows us to represent gene expression in a binary format.We then treat the spatial distribution of cells as a two-dimensional point process, with the binary gene expressions ξ •j serving as the markers of these points.This setup enables us to effectively use the mark interaction model to assess the spatial correlations among these markers.In the context of SVG detection via the energy-based approach such as outlined by Jiang et al. (2022), the core concept involves quantifying the interactions between spots or cells of high and low expression levels for a given gene j, designated by q = 1 and 0, respectively.To streamline the energy function presented in Eq. 5, we only focus on the second-order intensity, θ 12 , hereafter referred to as θ j , while omitting self-interaction terms, θ 11 and θ 22 .In other words, interactions between the neighboring points with the same marks are excluded.This adjustment notably simplifies model complexity, rendering the negative energy function used in BOOST-MI a special case of the proposed BOOST-HMI, assuming λ = 0 and c d is chosen to match the distance between adjacent spots or cells.For simplicity, as outlined in Section 2.2, we treat the decay parameter λ as a predefined hyperparameter λ 0 .Within this framework, the energy function can be expressed as follows: To interpret the model parameters, we provide the conditional probability of observing a high-expression level ξ ij = 1 of gene j at cell i, given the expression levels of other cells for gene j: As introduced in Section 2.2, the model parameters ω j (ω 0j , ω 1j ) ⊤ represent the first-order property, and θ j reflects the second-order property of the spatial distribution of marks.Specifically, model parameters ω 0j and ω 1j are related to the abundance of cells with low and high expression levels of gene j, respectively.From Eq. 7, when ).This means the conditional distribution of ξ ij for any given cell is independent of the states of all other cells, thereby generating a completely random expression pattern indicative of a non-SVG.If the interaction parameter θ j between highly and lowly expressed cells is positive, then Eq. 7 becomes a decreasing function with respect to the number of lowly expressed neighbors i~i′ I (ξ i′j = 0).This implies that a cell i is more likely to be in the highly expressed group when there are fewer low expressed cells in the surrounding area.In other words, a positive θ j suggests that the gene expression level at cell i tends to be the same with the majority of its neighboring cells, leading to a repulsion pattern.Conversely, a negative θ j indicates an attraction pattern, where cells of differing expression levels are more likely to be adjacent.Both the attraction and repulsion patterns are characteristic of SVGs.It is important to recognize that the energy functions used in BOOST-HMI and BOOST-MI differ in their signs.As a result, θ j assumes opposite meanings between these two models.Parameter λ 0 controls the change of interaction strength between a pair of points with respect to their distance.A larger λ 0 causes the interaction between 2 cells to diminish faster, resulting in a smaller interactive neighborhood for each cell.As a hyperparameter, λ 0 needs to be set appropriately to reflect the interaction neighborhood for cells.

Model fitting
In this section, we introduce the MCMC algorithm for model fitting and posterior inference.Our model space consists of (M, Φ, H, Ξ, ω 0 , θ) with the underlying grouped gene expression levels M [μ kj ] 2×p , the dispersion parameters Φ [ϕ kj ] 2×p , the extra zero indicators H [η ij ] n×p , the binary expression level indicators Ξ [ξ ij ] n×p , the first-order intensity parameter ω 0 (ω 01 , . . ., ω 0p ) ⊤ and the interaction parameter θ (θ 1 , . . ., θ p ) ⊤ in the mark interaction model.Each gene is examined independently by BOOST-HMI.We give the full posterior distribution for gene j as, Our primary aim was to infer ω 0j , θ j and ξ •j , which define the Gibbs probability measure based on the local energy function.We provide estimation and inference on first-order intensity ω 0j , which represents the abundance of lowly expressed levels of gene j, and the second-order intensity θ j , which captures the spatial correlation between two expression levels.The estimated latent gene expression level indicator provides a robust estimation of the spatial organization of marks.

MCMC algorithms
We estimate μ 0j , μ 1j , ϕ 0j and ϕ 1j using the random walk Metropolis-Hastings (RWMH) algorithm.η •j and ξ •j are estimated with a Gibbs sampler.The Gibbs probability measure for the distribution of latent gene expression indicator ξ •j in Eq. 7 omits an intractable normalizing constant C(ω 0j , ω 1j , θ j ) , which makes the Metropolis-Hastings algorithm infeasible.For instance, to model a gene expression profile with n = 257 cells, we need to traverse 2 257 ≈ 2.3 × 10 77 different arrangements of ξ for every gene, which is a heavy computational burden.To overcome this issue, we use the double Metropolis-Hastings (DMH) algorithm proposed by Liang et al. Liang (2010) to estimate ω 0j and θ j by canceling the intractable normalizing constant.The DMH is an efficient auxiliary variable MCMC algorithm.In contrast to other auxiliary MCMC algorithms, it does not require drawing the auxiliary variables from a perfect sampler, which usually increases computational cost (Møller et al., 2006).The full details of MCMC algorithms is described in Section S1 of the Supplementary Material.

Posterior inference
Posterior inference of parameters μ 0j , μ 1j , ϕ 0j , ϕ 1j , ω 0j , and θ j is obtained by averaging the MCMC posterior samples after burn-in.We are interested in identifying the SVGs by summarizing the interaction parameter θ.As stated in Section 2.3, investigating whether θ j is positive or negative is of great importance to inferring the spatial expression pattern of gene j.To test if gene j demonstrates an attraction pattern, we applied hypothesis testing M 0 : θ j ≥ 0 versus M 1 : θ j < 0. To test repulsion pattern, the hypothesis testing is M 0 : θ j ≤ 0 versus M 1 : θ j > 0. If there is strong evidence to reject the null hypothesis M 0 , we conclude that gene j is an SVG.The Bayes factor (BF) is computed to infer whether θ j is positive or negative with statistical significance from the MCMC algorithm results.The Bayes factor measures the favorability of M 1 as , for attraction, where u indexes the iteration and U is the total number of iterations after burn-in.The larger the BF j , the more likely gene j is an SVG with an attraction pattern.The smaller the BF j , the more likely gene j is an SVG with a repulsion pattern.
Another important parameter in our model is the latent gene expression level indicator ξ •j .We summarize the posterior distribution of ξ •j via maximum-a-posteriori (MAP) estimates, which is the mode of the posterior distribution.A more comprehensive summary of ξ
Spatial locations of simulated data were from the geospatial profile of the mouse hippocampus dataset field 43 (Shah et al., 2016) with n = 257 cells, which we present in Section 4.2.To generate the expression counts for gene j, the latent gene expression level indicators ξ •j 's were first generated based on Eq. 7 with three different values of ω 0j ∈ {1.4, 1, 0.6} and the fixed value of ω 1j = 1.These three values of ω j correspond to approximately 60%, 50%, and 40% lowly-expressed cells in ξ •j .Additionally, we set four different values of θ j ∈ {−2.5, −1.2, 1.9, 3.2} to generate SVGs with various patterns of attraction or repulsion.These values correspond to strong attraction, weak attraction, weak repulsion, and strong repulsion patterns, respectively.For the non-SVG, θ j = 0 which indicates complete randomness and no spatial correlation.The distance threshold c d and decay parameter λ 0 were set as 0.15 and 20, respectively.We then simulated gene expression data from a ZINB model with three different group-mean ratios R ∈ {2, 5, 10} between high and low expression: where the underlying baseline expression levels β 0 = 10.In the simulation study, size factors s (s 1 , . . ., s n ) ⊤ were generated from log-N (0, 0.2 2 ), and dispersion parameters ϕ 0j , ϕ 1j in the NB model are generated from an exponential distribution Exp (1/10).Further, to imitate high sparsity and account for medium sparsity in real SRT data, we created three sets of sparsity levels, 0%-10%, 10%-20%, and 30%-40%, and generated extra zero parameters η •j 's correspondingly.Extra zeros were randomly selected and imputed into the generated gene expression count data.Thus, we considered three group-mean ratios and three sparsity levels, which is 3 × 3 = 9 scenarios in total.For each scenario, we simulated 30 replicates with p = 100 genes in each replicate, 10 out of which were SVGs.Before estimating the parameters using BOOST-HMI, we specified the prior distributions.Non-informative gamma priors were specified for μ 0j , μ 1j , ϕ 0j and ϕ 1j , i.e., μ 0j ~Ga (a μ , b μ ) and ϕ 0j ~Ga (a ϕ , b ϕ ).We set a μ , b μ , a ϕ , and b ϕ to 0.01, which produced a gamma distribution with mean one and variance 100.Priors for ω 0j and θ j were set to control the gene expression abundance and gene expression pattern, ω 0j ~N(1, τ 2 ω ) and θ j ~N(0, τ 2 θ ).In the simulation study and real data analysis, we set τ ω = 0.5, where the prior distribution of ω 0j indicates that the latent proportion of low expression cells ranges from 1% to 100% with a probability of 95%.τ θ was set to 3.5 such that the prior for θ j guarantees that θ j falls within −6 to 6 with a probability of 92%.For hyperparameters in the energy function, we set the distance threshold c d = 0.15 and expected the relationship of decay parameter λ 0 and c d to be exp (−λ 0 c d ) = 0.05, which specifies the range of exponential decay function exp (−λ 0 d ii′ ) to be [0.05,1] as c d ≥ d ii' ≥ 0; therefore, we set λ 0 as 20 correspondingly.As for the setting of the MCMC algorithm, we implemented BOOST-HMI in a gene-wise fashion.For each gene, we initialized model parameters by randomly drawing from their prior distributions.The MCMC algorithm is iterated U = 10, 000 times after 10,000-iteration burn-in.The algorithm was implemented in R and Rcpp.As mentioned in Section 3.2, BOOST-HMI identifies SVGs based on Bayes Factors (BFs).In our study, we select a BF threshold of 10, which indicates strong evidence in favor of the M 1 (Kass and Raftery, 1995).
We implemented the other five competing methods with their default settings.BOOST-GP selects SVGs using Bayes factors.SpatialDE, BinSpect, SPARK, and SPARK-X, use p-values to select SVGs.To control type-I error rate, the Benjamini-Hochberg (BH) (Benjamini and Hochberg, 1995) procedure was used to adjust p-values from SpatialDE and BinSpect.We specifically avoided adjusting p-values from SPARK and SPARK-X since its raw p-values are calibrated by the Cauchy combination rule (Liu et al., 2019;Sun et al., 2020).For p-values, the threshold was set to 0.05.
Our task is to evaluate the ability of each method to correctly identify underlying SVGs from the simulated dataset, which can be defined as a binary classification problem; therefore, to evaluate the performance of the five methods, we employed two performance metrics for binary classification problems: First, we used the area under the curve (AUC) (Bradley, 1997) of the receiver operating characteristic (ROC) (Fukunaga, 2013).The ROC is a plot of the true positive rate against the false positive rate for different classification thresholds.The AUC is a single value ranging from 0 to 1, with a higher value indicating better classification performance.
Figure 2 displays a boxplot of AUCs calculated by the aforementioned seven methods over 30 replicates across nine scenarios.It clearly suggests that BOOST-HMI achieves superior performance compared to the other methods, especially when there was high sparsity.BinSpect-kmeans and BinSpect-rank showed competitive performance when there was no zero-inflation, i.e., when the sparsity level was between 0% and 10%, regardless of the different group ratios; however, these methods showed decreasing AUCs as sparsity level increased.SPARK and SpatialDE suffered from a limited ability to detect SVGs from low expression variability or high zero-inflation scenarios.Between SPARK and SpatialDE, the simulation study showed that SPARK has better SVG detection power over SpatialDE, which is consistent with the conclusion drawn by Sun et al. (2020) and Jiang et al. (2022).In summary, BOOST-HMI achieved satisfactory performance and is robust against different group expression level ratios and sparsity levels.
The second metric we used is the Matthews correlation coefficient (MCC).MCC is a summary value that examines the binary classification performance under a specific cutoff, i.e., BF or p-value thresholds for our study.It has values ranging from −1 to 1, incorporating true positives, true negatives, false positives and false negatives.A larger MCC value, such as 1, corresponds to an excellent classifier, while a negative MCC indicates a strong disagreement between prediction and observation.Table 1 summarizes the average MCCs obtained in the simulation study across the five methods.The result is consistent with our conclusion from our analysis of the AUC: BOOST-HMI achieved the highest power under high zero-inflation, while all other six methods suffered from the high number of false zeros.In the scenario without zero inflation, BinSpect-kmeans stood out, and BinSpect-rank, SPARK, and BOOST-HMI showed competitive performance in identifying SVGs.
We further expanded our evaluation of BOOST-HMI's effectiveness through comprehensive analyses using simulated data.These investigations, detailed in the Supplementary Material, encompass a scalability test (Section S2), a sensitivity analysis (Section S3), an examination of performance under model mis-specification (Section S4), and an assessment of statistical power and false discovery control (Section S5).

Application to the mouse hippocampus seqFISH data
The mouse hippocampus dataset is a public seqFISH dataset with 21 field replicates collected on a third coronal section (Shah et al., 2016).Following SpatialDE and SPARK protocols, we analyzed the field 43 dataset, which contains p = 249 genes measured on 257 cells with spatial location preserved.Out of Simulation study: The boxplots of AUCs achieved by BOOST-HMI, SPARK, SPARK-X, SpatialDE, BinSpect-kmeans, BinSpect-rank, and BOOST-GP across nine scenarios.1. y ij and y hj are the gene expression count of cell i and cell h, and y j is the mean expression of gene j.Similar to Moran's I, Geary's C measures the spatial similarity or dissimilarity between neighboring cells, and is calculated with the following formula: Geary′s C n − 1 2 i h w ih i h w ih y ij − y hj This paper introduces BOOST-HMI, a novel method for identifying SVGs in imaging-based SRT datasets.By integrating gene expression data with spatial location, BOOST-HMI employs a ZINB mixture model to effectively handle the excessive zeros typical in SRT data.Additionally, it uses a hidden Bayesian mark interaction model to accurately quantify spatial dependencies in gene expressions.
Our approach is adaptable for analyzing sequencing-based SRT data.We validated BOOST-HMI through a simulation study and analysis of two real datasets, demonstrating its effectiveness across various SRT technologies and tissue sections.The simulation results showed that BOOST-HMI is particularly adept at identifying SVGs in data with high sparsity levels, between 30% and 40%.When analyzing the mouse hippocampus seqFISH data, BOOST-HMI achieved comparable results to SPARK, with the identified SVGs exhibiting stronger spatial patterns as quantified by Moran′sI and Geary′sC.Moreover, the SVGs identified were enriched in biologically relevant GO terms, such as smoothened signaling pathways and regulation of neural precursor cell proliferation, offering avenues for further biological investigation.
Further analysis of the mouse visual cortex STARmap dataset revealed that BOOST-HMI can identify SVGs with spatial patterns aligning with the underlying cell structure of the tissue.Additionally, GO enrichment analysis indicated that these SVGs are linked to cellular developmental processes, underscoring the potential for novel biological insights.
While our method assumes homogeneity in spatial patterns across tissue sections, this may not hold true for all cases.Future work will aim to generalize BOOST-HMI to accommodate heterogeneous spatial patterns, enhancing its practicality.Another focus will be on scaling the model to accommodate the growing size of SRT datasets, such as those generated by advanced technologies like Slide-seqV2, which can resolve over 19,600 genes from around 23,000 cells (Stickels et al., 2021).Enhanced scalability will enable BOOST-HMI to analyze datasets from various technologies like HDST, Slide-seqV2, and others, potentially leading to more groundbreaking biological discoveries.

FIGURE 1
FIGURE 1 (u)  ij /U.Then, the latent expression indicator indicates a high expression spot when PPIs are greater than a threshold c p :ξ PPI •j I PPI 1j ≥ c p , . . ., I PPI nj ≥ c p ⊤ .

FIGURE 2
FIGURE 2 expression in the majority of cells.The remaining three unique SVGs, Zic3, Mnat1, and slc17a7 delineated three distinct patterns, which may be related to novel biological mechanisms.Lim et al. (2007) previously highlighted the crucial role of Zic3 in preserving pluripotency in embryonic stem cells, whileHerman and El-Hodiri (2002) demonstrated that mutations in Zic3 are linked to developmental abnormalities such as laterality defects, congenital heart disease, and neural tube defects.In addition to visualizing spatial patterns, we quantified the degree of spatial attraction or repulsion pattern in gene expression across different cells using the spatial autocorrelation tests Moran's I and Geary's C. Moran's I quantifies the spatial clustering or dispersion by standardizing the spatial autocovariance, yielding a correlation coefficient ranging from −1 to 1.A positive Moran's I value corresponds to a spatial clustering pattern where the variable tends to have similar values to its neighboring cells.A Moran's I value close to 0 suggests a random spatial distribution of the data, while a negative value corresponds to a dispersion pattern, where the variable value tends to be dissimilar from its neighbors.To assess the spatial patterns exhibited by the SVGs, Moran's I and Geary's C values were calculated for each SVG identified by at least one of the four methods.Moran's I for each gene j was calculated by the following formula:

FIGURE 3
FIGURE 3 Mouse hippocampus seqFISH data: (A) Spatial distribution of hidden gene expressions ξ MAP of the 12 SVGs detected by both SPARK and BOOST-HMI.(B) Spatial distribution of hidden gene expressions ξ MAP of the five SVGs detected by SPARK only.(C) Spatial distribution of hidden gene expressions ξ MAP of the ten SVGs detected by BOOST-HMI only.(D) Venn diagram of the overlap across SVGs identified by all four methods.(E) Enriched GO terms associated with SVGs detected by BOOST-HMI.(F) Boxplot of Moran's I and Geary's C values for SVGs across the four methods.

FIGURE 4
FIGURE 4 Mouse visual cortex STARmap data: (A) Voronoi diagrams of layer structures and cell type distribution.(B) Spatial distribution of the average hidden gene expressions ξ MAP of the six SVG patterns detected by SPARK.(C) Spatial distribution of the average hidden gene expressions ξ MAP of the five SVG patterns detected by BOOST-HMI.(D) Venn diagram of the overlap across SVGs identified by all four methods.(E) Enriched GO terms associated with SVGs detected by BOOST-HMI.(F) Boxplot of Moran's I and Geary's C values for SVGs across the four methods.